Pipelined device and a method for executing transactions in a pipelined device

ABSTRACT

A pipelined device and method for executing transactions in a pipelined device, the method includes: setting limiter thresholds that define a maximal amount of pending transaction requests to be provided from one pipeline stage to another pipeline stage; executing an application while monitoring the performance of a device that comprises pipeline limiters; wherein the executing includes: selectively transferring transaction requests from one stage of the pipeline to another in response to the limiter thresholds, arbitrating between transaction requests at a certain pipeline stage, and executing selected transaction requests provided by the arbitrating.

FIELD OF THE INVENTION

The present invention relates to a pipelined device and to a method for executing transactions in a pipelined device.

BACKGROUND OF THE INVENTION

Deep pipelined devices, such as but not limited to on-chip interconnects, can interface with many components. Requests to receive a service or gain access to a certain bus (also referred to as transaction requests) are usually sent to an arbiter that can apply various arbitration schemes in order to determine which service shall be granted or which component can gain access to a shared medium such as a shared bus.

The arbitration schemes can be responsive to the priority of the requesting component. Accordingly, more important components such as processors, digital signal processors and the like are associated with higher request priority. In deep pipelined devices, requests to receive a service can propagate through many pipeline stages before reaching the arbiter. These pipeline stages effectively form a request queue in which lower priority requests can be located ahead of higher priority requests.

Very deep pipelines can store many transaction requests. Such a deep pipeline can result in relatively long delays prior to the execution of an urgent transaction request that was received after many less urgent transaction requests were already sent to the pipeline. On the other hand, very shallow pipelines are characterized by lower throughput.

There is a need to provide efficient devices and methods for managing transactions.

SUMMARY OF THE PRESENT INVENTION

A pipelined device and a method for executing transactions in a pipelined device, as described in the accompanying claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood and appreciated more fully from the following detailed description taken in conjunction with the drawings in which:

FIG. 1A illustrates a device having priority upgrade capabilities, according to an embodiment of the invention;

FIG. 1B illustrates modular components of a device having priority upgrade capabilities, according to an embodiment of the invention;

FIG. 2A illustrates an interconnect, according to an embodiment of the invention;

FIG. 2B illustrates an interconnect having pipeline limiting capabilities, according to an embodiment of the invention;

FIG. 3A illustrates a device having priority upgrade capabilities, according to an embodiment of the invention;

FIG. 3B illustrates a device having pipeline limiting capabilities, according to an embodiment of the invention;

FIG. 4 illustrates an expander, according to an embodiment of the invention;

FIG. 5 illustrates a splitter, according to an embodiment of the invention;

FIG. 6 illustrates a multiplexer and arbiter, according to an embodiment of the invention;

FIG. 7 illustrates a clock separator, according to an embodiment of the invention;

FIG. 8 illustrates a bus width adaptor, according to an embodiment of the invention;

FIG. 9 illustrates an arbitration method, according to an embodiment of the invention;

FIG. 10 illustrates a method for priority updating, according to an embodiment of the invention; and

FIG. 11 illustrates a method for pipeline limiting 1200, according to an embodiment of the invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The following figures illustrate exemplary embodiments of the invention. They are not intended to limit the scope of the invention but rather to assist in understanding some of the embodiments of the invention. It is further noted that the figures are not drawn to scale.

According to an embodiment of the invention the pipelined device includes an interconnect. Device 10 can be a mobile device such as a mobile phone, music player, audio-visual device, personal data accessory, laptop computer, and the like. Device 10 can also be a stationary device such as a desktop computer, server, network node, and the like.

Device 10 can include multiple interconnect building blocks such as but not limited to expanders, arbiters and multiplexers, splitters, samplers, clock separators and bus width adaptors. Some of these building blocks can perform time based priority upgrade, can update priority in response to a request to upgrade priority, and the like. An interconnect can include one arbiter that is followed (usually after other stages) by another arbiter.

A method for executing transactions in a pipelined device is provided. The method allows the depth of a pipeline to be selectively defined by limiting the number of co-pending transaction requests, using one or more limiters positioned in various locations, especially between pipeline stages of the pipelined device but also at the input or output of the pipelined device.

Conveniently, the method includes: (i) setting limiter thresholds that define a maximal amount of pending transaction requests to be provided from one pipeline stage to another pipeline stage; (ii) executing an application while monitoring the performance of the pipelined device, wherein the executing includes: (a) selectively transferring transaction requests from one stage of the pipeline to another in response to the limiter thresholds, (b) arbitrating between transaction requests at a certain pipeline stage, and (c) executing selected transaction requests provided by the arbitrating.

Conveniently, a pipelined device is provided. The pipelined device is adapted to execute transactions. The pipelined device includes: (i) an arbiter adapted to arbitrate between transaction requests, (ii) multiple limiters, (iii) a controller adapted to set limiter thresholds that define a maximal amount of pending transaction requests to be provided from one pipeline stage to another pipeline stage; and (iv) multiple monitors adapted to monitor traffic that passes through various buses of the device while the device executes an application. A limiter that is connected between one pipeline stage and another pipeline stage of the pipelined device is adapted to selectively transfer transaction requests from one pipeline stage of the pipeline to another in response to the limiter threshold, and a pipeline stage of the pipelined device is adapted to execute selected transaction requests provided by the arbiter.

FIG. 1A illustrates device 10 according to an embodiment of the invention.

Each of the illustrated devices can be regarded as a pipeline stage.

Device 10 includes arbiter 800 that arbitrates between transaction requests in response to priority attributes associated with the transaction requests.

Device 10 includes a first sequence 18 of pipeline stages 18,1-18,4 that precede arbiter 800 and also includes a second sequence 19 of pipeline stages 19,1-19,5 that precede arbiter 800. Arbiter 800 arbitrates between the transaction request that is stored at the head of first sequence 18 and the transaction request that is stored at the head of second sequence 19.

At least one pipeline stage out of first sequence 18 is adapted to receive a request to update a priority of transaction requests stored within first sequence 18 to a requested priority. The request can originate from a requesting unit or from a time based priority upgrade mechanism. This pipeline stage can send the request to the following stages of the pipeline. Such a pipeline stage can be an expander such as expander 600 of FIG. 1B.

For each transaction request stored in first sequence 18, device 10 is adapted to update its priority if the transaction request is priority upgradeable and if the requested priority is higher than the current priority of the transaction request. The same applies to transaction requests stored in second sequence 19.

In many cases high priority requests are stuck behind lower priority requests. The arbiter is not aware of the high priority requests, which can eventually be served only after a relatively long time period. By upgrading the priority of all the priority upgradeable transaction requests the pending period of the originally high priority request will be shortened.

TABLE 1 illustrates various priority upgrade scenarios. It is assumed that P1<P2<P3<P4.

TABLE 1

Pipeline stage   Current priority   Upgradeable?   Requested priority   Result of priority upgrade
18.4             P1                 Y              P3                   P3
18.3             P4                 N              P3                   P4
18.2             P3                 Y              P3                   P3
18.1             P1                 Y              P3                   P3
19.5             P3                 Y              P4                   P4
19.4             P1                 Y              P4                   P4
19.3             P4                 Y              P4                   P4
19.2             P2                 N              P4                   P2
19.1             P3                 N              P4                   P3
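The upgrade rule behind TABLE 1 can be sketched in a few lines of Python. This is only an illustrative model and not the claimed implementation: the `Request` class, the `upgrade` function and the encoding of P1 to P4 as the integers 1 to 4 are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical model of one pending transaction request."""
    stage: str          # pipeline stage holding the request, e.g. "18.4"
    priority: int       # 1..4, where a larger value is more urgent (P1 < P4)
    upgradeable: bool   # False for non-upgradeable requests


def upgrade(request: Request, requested: int) -> int:
    """Raise the priority only if the request is upgradeable and the
    requested priority is higher than the current one."""
    if request.upgradeable and requested > request.priority:
        request.priority = requested
    return request.priority


# First sequence 18 of TABLE 1, with a requested priority of P3.
first_sequence = [
    Request("18.4", 1, True),
    Request("18.3", 4, False),
    Request("18.2", 3, True),
    Request("18.1", 1, True),
]
results = [upgrade(r, 3) for r in first_sequence]
```

Under these assumptions `results` holds P3, P4, P3 and P3, which agrees with the first four rows of TABLE 1.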

In order to add some fairness to the arbitration process the priority of an upgradeable transaction request can be increased as its pending period increases. This mechanism is referred to as a time based priority upgrade mechanism.

Those of skill in the art will appreciate that other time based priority level upgrading schemes can be applied without departing from the scope of the invention.

According to an embodiment of the invention a predefined timing threshold T1 is defined. When half of T1 passes the priority level is upgraded. When another fourth of T1 passes the priority level is further upgraded. When another eighth of T1 passes the priority level is further upgraded. This mechanism can be applied by using a multiplexer that has multiple inputs and one output. The output is connected to a priority upgrade period counter. When the counter is reset (or rolls over or reaches a predefined value) a priority upgrade occurs. The different inputs of the multiplexer are connected to different portions (offset by one bit) of a register that stores T1. The first input receives the whole T1. The second input receives T1 without its least significant bit (which equals T1/2). The third input receives T1 without its two least significant bits (T1/4), and so on. The multiplexer is controlled by a selection unit that alters its selection each time the priority upgrade period counter is reset.
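The halving schedule can be modeled as follows. This is a software sketch only: it assumes that the first selected period is T1/2, as in the opening sentences of the paragraph above, and that dropping least significant bits is equivalent to an integer right shift. The function names `upgrade_times` and `priority_after` are hypothetical.

```python
def upgrade_times(t1: int, upgrades: int = 3) -> list:
    """Return the pending times at which successive upgrades occur.

    The k-th period is T1 shifted right by k bits (T1/2, T1/4, T1/8, ...),
    mirroring multiplexer inputs that drop one more low bit each time.
    """
    times = []
    now = 0
    for k in range(1, upgrades + 1):
        now += t1 >> k      # period selected for this upgrade
        times.append(now)
    return times


def priority_after(start: int, pending: int, t1: int, max_priority: int = 4) -> int:
    """Priority of a request that has been pending for `pending` cycles.

    Levels are integers, with `max_priority` the highest level (assumed 4).
    """
    priority = start
    for t in upgrade_times(t1, max_priority - start):
        if pending >= t:
            priority += 1
    return priority
```

For example, with T1 equal to 64 cycles the upgrades fall at 32, 48 and 56 pending cycles, so a request that starts at the lowest level and has waited 50 cycles has been upgraded twice.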

FIG. 1B illustrates modular components 300-800 of device 10 according to an embodiment of the invention.

Conveniently, the modular components include: (i) expander 600, (ii) arbiter and multiplexer 800, (iii) splitter 500, (iv) sampler 700, (v) clock separator 300, and (vi) bus width adaptor 400.

It is noted that an interconnect does not necessarily include all these components. It is further noted that these components can also be used as stand-alone components in the integrated circuit. Those of skill in the art will appreciate that an interconnect can include multiple stages of these modular components.

According to an embodiment of the invention each of these modular components (building blocks) uses the same standard interface, so as to facilitate a glue-less connection between these components.

According to another embodiment of the invention each modular component can alter various attributes of various pending transaction requests. For example, various transaction requests can be associated with an arbitration priority that can be upgraded. Each modular component can upgrade the priority of the transaction requests it stores, either in response to a request from another component or by applying a time based priority upgrade scheme.

Conveniently, at least one modular component can receive and generate signals that represent the beginning and/or end of the following phases: a request and address phase, a data phase and an end of transaction phase.

Conveniently, at least one modular component can store one or more transaction requests and also support multiple pending transaction requests that are stored in other components. For example, the expander 600 can receive up to sixteen transaction requests that were not followed by data phases and/or end of transaction phases, although it can store a more limited amount of requests.

Expander 600 allows a single master with a point-to-point interface to access a plurality of slaves, each with a point-to-point interface. The slave selection is based upon address decoding. Arbiter and multiplexer 800 allows a plurality of masters with a point-to-point interface to access a single slave with a point-to-point interface.

Splitter 500 allows a single master with a point-to-point interface to access a single slave with a point-to-point interface. The splitter 500 optimizes transactions according to the capabilities of the slave.

Sampler 700 allows a single master with a point-to-point interface to access a single slave with a point-to-point interface. It samples the transactions generated towards the slave. It is noted that the sampler 700 as well as other components can include one or more sampling circuits and optionally one or more bypassing circuits.

Clock separator 300 allows a single master with a point-to-point interface to access a single slave with a point-to-point interface. The master may operate in one clock domain while the slave operates in another clock domain. Bus width adaptor 400 allows a single master with a point-to-point interface to access a single slave with a point-to-point interface. The master's data bus width is different than the slave's data bus width.

Conveniently, each modular component out of components 200-800 includes an input interface and an output interface. For convenience of explanation these interfaces were illustrated only in FIG. 7 (input interface 305 and output interface 315) and in FIG. 8 (input interface 205 and output interface 215).

According to an embodiment of the invention multiple modular components out of components 200-800 include a sampling circuit that can be selectively bypassed by a bypass circuit. For convenience of explanation only FIG. 4 illustrates a sampling circuit 610 and a bypass circuit 612.

FIG. 2A illustrates device 11 having priority upgrade capabilities, according to an embodiment of the invention.

Device 11 includes interconnect 100. Interconnect 100 connects between M masters and S slaves. M and S are positive integers. The M masters are connected to M input ports 102(1)-102(M) while the S slaves are connected to output ports 101(1)-101(S). These input and output ports can support bi-directional traffic between masters and slaves. They are referred to as input and output ports for convenience only. Conveniently, the input ports 102(1)-102(M) are the input interfaces of the expanders 600(1)-600(M) and the output ports are the output interfaces of splitters 500(1)-500(S).

Interconnect 100 includes M expanders 600(1)-600(M), S arbiters and multiplexers 800(1)-800(S) and S splitters 500(1)-500(S). Each expander includes a single input port and S outputs, wherein different outputs are connected to different arbiters and multiplexers.

Each arbiter and multiplexer 800 has a single output (that is connected to a single splitter) and M inputs, wherein different inputs are connected to different expanders 600. Each splitter 500 is connected to a slave.

It is noted that interconnect 100 can have a different configuration than the configuration illustrated in FIG. 2A. For example, it may include multiple samplers 700, clock separators 300 and bus width adaptors 400. These components can be required in order to support connections to slaves and masters that have different bus widths and operate at different frequencies.

Each splitter 500 is dedicated to a single slave. This splitter 500 can be programmable to optimize the transactions with that slave. Conveniently, each splitter 500 is programmed according to the slave's maximal burst size, alignment and critical-word-first (wrap) capabilities.

Interconnect 100 can operate as a low latency interconnect by utilizing the minimal amount of sampling circuits and bypassing other sampling circuits. It can also operate as a latency insensitive interconnect.

Conveniently, interconnect 100 is a non-blocking full fabric switch that supports per-slave arbitration, thus it enables maximal data bus utilization towards each of the slaves.

Each modular component of the interconnect 100 has a standard, point-to-point, high performance interface. Each master and slave is interfaced via that interface. This interface uses a three-phase protocol. The protocol includes a request and address phase, a data phase and an end of transaction phase. Each of these phases is granted independently. The protocol defines parking grant for the request and address phase. The data phase and the end of transaction phase are conveniently granted according to the fullness of the buffers within the interconnect 100. The request is also referred to as a transaction request. The end of transaction phase conveniently includes sending an end of transaction (EOT) indication.

For example, a master can send a write transaction request to an expander 600(1). The expander 600(1) can store up to three write transaction requests, but can receive up to sixteen write transaction requests, as multiple transaction requests are stored in other components of the interconnect. Thus, if it has received the sixteenth write transaction request (without receiving any EOT or EOD signal from the master) it sends a busy signal to the master, which should then be aware that it cannot send a seventeenth transaction request.

On the other hand, when the expander 600(1) stores the transaction request it sends an acknowledgement to the master, which can then enter the data phase by sending data to the expander 600(1). Once the expander 600(1) finishes receiving all the data it sends an EOD signal to the master, which can then end the transaction.

The expander 600(1) sends the transaction request to the appropriate arbiter and multiplexer. When the transaction request wins the arbitration and when the multiplexer and arbiter receives a request acknowledge signal then expander 600(1) sends the data it received to the splitter. Once the transmission ends the expander 600(1) enters the end of transaction phase. The splitter then executes the three-phase protocol with the target slave.

Interconnect 100 can use multiple sampling circuits in order to interconnect between high frequency masters and remote slaves. The amount of sampling circuits affects the depth of the pipeline, although the depth of the pipeline can also be responsive to other parameters such as but not limited to the buffering capabilities of the interconnect 100, and the like. The amount of sampling circuits can be increased by adding samplers, such as sampler 700, to interconnect 100, and/or by not bypassing the sampling circuits within the expanders, the arbiters and multiplexers, and the splitters. For example, the expander 600 includes a main sampler 640 as well as an address and attribute sampler 610 that can be bypassed.

Conveniently, interconnect 100 can terminate a write transaction locally or let it be terminated by the slave. The write termination capability is enabled by an attribute that is associated with the transaction. In order to provide data coherency the slave should terminate the write transaction, otherwise the interconnect 100 can terminate the transaction locally.

Conveniently, the interconnect 100, and especially each arbiter and multiplexer 800, implements an arbitration scheme that can be characterized by the following characteristics: multiple (such as four) quality-of-service (or priority) levels, a priority upgrade mechanism, priority mapping, pseudo round robin arbitration, time based priority level upgrade, priority masking, weighted arbitration, and late decision arbitration.

The priority level is an attribute of each transaction. The arbiter includes a dedicated arbiter circuit per priority level. The priority upgrade mechanism allows a master (or another component) to upgrade a priority level of a pending transaction, based upon information that is acquired after the generation of that transaction request. The upgrade involves altering the priority attribute associated with the transaction request. The update can be implemented by the various components of the interconnect.

According to an embodiment of the invention some transaction requests can be labeled as non-upgradeable, while other transaction requests can be labeled as upgradeable. Non-upgradeable transaction requests are not upgraded during priority upgrade sessions.

Priority mapping allows master priority levels to be mapped to slave priority levels or to a common priority level mapping. Pseudo round-robin arbitration involves storing the last arbitration winner and scanning a transaction request vector from the last arbitration winner until a current transaction request is detected.
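The pseudo round-robin scan can be illustrated with a short sketch. It assumes that the request vector is a list of booleans indexed by requester and that the scan starts at the position following the last winner and wraps around; the name `round_robin_pick` is hypothetical.

```python
def round_robin_pick(requests, last_winner):
    """Return the index of the next requester with a pending request,
    scanning onward from the stored last winner, or None if the
    request vector is empty of requests."""
    n = len(requests)
    for offset in range(1, n + 1):
        candidate = (last_winner + offset) % n   # wrap around the vector
        if requests[candidate]:
            return candidate
    return None


# Requesters 0 and 2 are asserting; requester 0 won last time.
winner = round_robin_pick([True, False, True, False], last_winner=0)
```

Because the scan starts after the stored winner, a requester that has just been served is considered last, which is what gives the scheme its rotating behavior.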

Time based priority level upgrading includes updating the priority level of pending transaction requests in response to the time they are pending. Conveniently, this feature reduces the probability of starvation. According to an embodiment of the invention a predefined timing threshold T1 is defined. When half of T1 passes the priority level is upgraded. When another fourth of T1 passes the priority level is further upgraded. When another eighth of T1 passes the priority level is further upgraded. Those of skill in the art will appreciate that other time based priority level upgrading schemes can be applied without departing from the scope of the invention.

Priority masking includes selectively masking various requests of predefined priorities during predefined time slots. Conveniently, during one time slot the highest priority transaction requests are masked, during another time slot the highest and the second highest priority transaction requests are masked, and so on. Conveniently, some transaction requests cannot be masked, and during various time slots all the transaction requests are allowed. This guarantees a minimal number of arbitration winning slots for transactions with lower priorities, thus resolving potential starvation problems.

Weighted arbitration includes allowing an arbitration winner to participate in multiple consecutive transactions (a transaction sequence) after winning an arbitration session. The weight can represent the amount of transactions that can be executed by an arbitration winner. Conveniently, if during the transaction sequence a higher priority transaction request wins the arbitration then the transaction sequence stops.
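A simple model of weighted arbitration is given below. It is a sketch under stated assumptions: priorities are integers with larger values being more urgent, the first transaction of the sequence always executes because the winner has just won arbitration, and `transactions_served` and its `challengers` argument are hypothetical names.

```python
def transactions_served(weight, winner_priority, challengers):
    """Count the consecutive transactions executed by an arbitration winner.

    challengers[i] is the priority of the strongest competing request
    visible before transaction i starts, or None when there is none.
    """
    served = 0
    for i in range(weight):
        challenger = challengers[i] if i < len(challengers) else None
        # After the first transaction, a strictly higher priority
        # request ends the transaction sequence early.
        if i > 0 and challenger is not None and challenger > winner_priority:
            break
        served += 1
    return served
```

For example, a winner of priority 2 with a weight of four executes only two transactions if a priority 3 request appears before the third one.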

Late decision arbitration includes determining a new arbitration winner substantially at the end of a currently executed transaction or substantially after a delay corresponding to the length of the current transaction.

Interconnect 100 is an ordered interconnect, thus it does not require area-consuming re-order buffers. Conveniently, interconnect 100 is synthesized within a bounded centralized area, generating a star topology. This synthesis may require adding a small amount of buffers between interconnect 100 and the masters and slaves that are connected to it. Nevertheless, this synthesis dramatically reduces the complexity of routing and further shortens the design and verification period.

Interconnect 100 has a relatively small area resulting in relatively low static power consumption. In addition, by applying power gating techniques the power consumption of interconnect 100 is further reduced.

Interconnect 100 includes multiple point-to-point interfaces (also referred to as ports) that inherently implement sampling. In addition, interconnect 100 includes multiple sampling circuits that can be selectively bypassed, thus preventing low frequency filtering problems arising from long paths.

Interconnect 100 supports an ordered transaction protocol. In addition, to simplify implementation and eliminate reorder buffers, interconnect 100 does not generate a transaction towards a new slave until all pending transactions towards that slave are completed. This behavior ensures that the order of transaction completion is the same as the order of transaction initiation. As a result the actual latency towards a certain slave may increase due to additional stall cycles.

According to another embodiment of the invention interconnect 100 includes a relatively limited reorder mechanism that does not require stalling a transaction towards one slave until a previous transaction towards that slave is completed.

FIG. 2B illustrates device 12 having pipeline limiting capabilities, according to an embodiment of the invention.

Device 12 of FIG. 2B differs from device 11 of FIG. 2A by having multiple limiters 906(1)-906(M) and 905(1)-905(S).

A limiter is placed between a pair of modular components, such as between an expander and an arbiter and multiplexer or between an arbiter and multiplexer and a splitter. The limiter participates in the handshake process between the pair of surrounding modular components. It receives a request to perform a transaction from one modular component. If the number of pending transaction requests does not exceed a limiter threshold then the transaction request is sent to the second modular component. If, on the other hand, the number of pending transaction requests exceeds the limiter threshold then the limiter does not pass the transaction request to the second modular component.
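The limiter behavior can be sketched as a counter that gates forwarding. This is a minimal model only: the `Limiter` class and its method names are hypothetical, and it assumes that a request is forwarded whenever doing so keeps the number of pending requests at or below the threshold, and that the count drops when a transaction completes.

```python
class Limiter:
    """Hypothetical model of a limiter between two modular components."""

    def __init__(self, threshold: int):
        self.threshold = threshold   # maximal amount of pending requests
        self.pending = 0             # requests forwarded but not completed

    def try_forward(self) -> bool:
        """Pass a request downstream only while the number of pending
        requests would not exceed the limiter threshold."""
        if self.pending < self.threshold:
            self.pending += 1
            return True
        return False                 # request is held back for now

    def complete(self) -> None:
        """Record that a previously forwarded transaction has ended."""
        if self.pending > 0:
            self.pending -= 1


limiter = Limiter(threshold=2)
outcomes = [limiter.try_forward() for _ in range(3)]
```

With a threshold of two, the third request is held back until `complete()` reports that an earlier transaction has ended, which effectively shortens the pipeline seen by that requester.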

The limiter thresholds can be determined in view of various parameters including: the master or slave component that sends or receives the transaction requests, the priority of the transaction requests, the required throughput of the interconnect, latency parameters and the like.

For example, limiters that are connected to lower priority masters that are characterized by relatively long transactions can be set to a lower limiter threshold.

As yet another example, the limiter threshold can be set in view of the required device throughput or latency. In interconnects that service many masters and slaves the optimal set of limiter thresholds can be hard to predict. In order to assist in the limiter threshold definition the behavior of the device can be monitored while different limiter thresholds are set.

FIG. 2B illustrates a set of monitors 956(1)-956(M) and 955(1)-955(S) that are connected to controller 98. Monitors 956(1)-956(M) are connected to the inputs of expanders 600(1)-600(M) while monitors 955(1)-955(S) are connected to the outputs of splitters 500(1)-500(S). It is noted that limiters can be placed near the monitors and vice versa.

These monitors can monitor the traffic that passes by various junctions/buses and report their findings to a controller 98 that can determine the limiter thresholds.

It is noted that the behavior of the device can depend upon the application that is being executed by the device. The limiter thresholds can also be set in response to the application that is being executed by the device.

Referring back to FIG. 2B, limiter 906(m, s) is positioned between the m'th expander 600(m) and the s'th arbiter and multiplexer 800(s), wherein index m ranges between 1 and M and index s ranges between 1 and S. Limiter 905(s) is positioned between the s'th arbiter and multiplexer 800(s) and the s'th splitter 500(s).

It is noted that limiters can be placed between some modular components, while various other connections between modular components can be left without a limiter. For example, a limiter can be placed after expanders that are connected to low priority master components.

It is noted that at least some limiters can include a fixed limiter threshold and are not connected to a controller such as controller 98.

FIG. 3A illustrates device 13 having priority upgrade capabilities, according to an embodiment of the invention.

Device 13 includes integrated circuit 11 that in turn includes a group of interconnects that includes interconnects 101, 102 and 103. The usage of multiple interconnects can be required when certain components should be connected by a low latency interconnect. If a single interconnect cannot provide such low latency then the components of the integrated circuit can be grouped into multiple groups. At least one group includes multiple components that are physically close to each other and that are interconnected by a low latency interconnect. Interconnects 101 and 102 are low latency interconnects while interconnect 103 is a latency insensitive interconnect. Conveniently, more sampling circuits are bypassed at interconnects 101 and 102 in comparison to interconnect 103.

Interconnect 101 interconnects a first group of components that includes processors 110 and 112 and shared on-chip memory 120. Interconnect 101 is also connected to interconnect 102 and interconnect 103. The two processors 110 and 112 are the masters of this interconnect.

Interconnect 102 interconnects a second group of components that includes processors 114 and 118 and shared on-chip memory 124. The two processors 114 and 118 are the masters of this interconnect.

Interconnect 103 interconnects a third group of components that includes DMA 122, external host interface (I/F) 116, peripherals 130, 132 and 134 and a memory controller 136. The memory controller 136 is connected to an off-chip memory 190. Peripherals 130, 132 and 134 and memory controller 136 are the slaves of interconnect 103.

FIG. 3B illustrates device 14 having pipeline limiting capabilities, according to an embodiment of the invention.

Device 14 differs from device 13 of FIG. 3A by including monitors 951(1)-951(6) and limiters 901(1)-901(6).

Monitor 951(1) and limiter 901(1) are connected between processor 110 and interconnect 101. Monitor 951(2) and limiter 901(2) are connected between processor 112 and interconnect 101. Monitor 951(3) and limiter 901(3) are connected between processor 114 and interconnect 102. Monitor 951(4) and limiter 901(4) are connected between processor 118 and interconnect 102. Monitor 951(5) and limiter 901(5) are connected between interconnect 101 and DMA controller 122. Monitor 951(6) and limiter 901(6) are connected between interconnect 101 and memory controller 136.

These monitors and limiters can be connected to controller 99 that is adapted to evaluate the performance of device 14 and in response determine limiter thresholds.

FIG. 4 illustrates an expander 600, according to an embodiment of the invention.

Expander 600 includes input port 102, multiple (such as S) output ports 601-603, an address and attribute sampler 610, an address and priority translation unit 620, slave decoder 630, main sampler 640, de-multiplexer 650 and control unit 660.

The address and attribute sampler 610 can be bypassed. If it is not bypassed it samples the address and attribute lines.

Expander 600 supports priority upgrades of transaction requests that are stored in it. Thus, a priority attribute of a stored transaction request can be updated. The updated priority is taken into account by arbiters and multiplexers 800(1)-800(S). The upgrade can usually take place before the slave that is the target of the transaction acknowledges the transaction request.

The main sampler 640 includes a double buffer for all lines from the master to the slave (including address, write data and attribute lines). The double buffer allows the address, write data and attribute lines of a certain transaction to be sampled before another transaction ends. The main sampler 640 provides a single buffer for the lines from the slave to the master (including, for example, read data).

The main sampler 640 facilitates transaction priority upgrading and also time based priority upgrading. Time based priority upgrade involves increasing a priority of a pending transaction request that is pending for more than a certain time threshold. Conveniently, multiple transaction priority upgrades can occur if the pending period exceeds multiple time thresholds.

The priority upgrading is conveniently initiated by a master and includes upgrading the priority of a certain pending transaction request (by altering the priority attribute). Conveniently, the priority attributes of other transaction requests that precede that certain transaction request are also upgraded. This feature maintains the order of requests while increasing the probability that a certain pipelined transaction request will be serviced before lower priority transaction requests. Conveniently, the control unit 660 can control this priority upgrade, but this is not necessarily so.

The address and priority translation unit 620 translates the upper bits of the address according to predefined values. The priority translation involves translating master transaction priority levels to slave transaction priority levels or to common priority levels. The translation can involve using a predefined transaction priority lookup table.

The slave decoder 630 receives an address over address lines and determines whether the transaction is aimed to a slave out of the S slaves that are connected to the interconnect or whether the address is erroneous, based upon a predefined address range that is associated with each slave.

According to one embodiment of the invention the address ranges that are allocated to each slave are unique so that only one slave can be selected. According to another embodiment of the invention the address ranges overlap but additional information such as slave priority is provided in order to resolve multiple matches between an input address and different address ranges.

Conveniently, the address ranges are stored in address registers located within the expander 600. Typically one address register stores the start address of the address range while the other address register stores the end address of the address range or an offset from the start address.
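The decoding performed by the slave decoder 630 can be sketched as follows for the embodiment in which the address ranges are unique. The function name, the tuple layout and the example address map are hypothetical; the overlapping-range embodiment, which needs a slave priority to resolve multiple matches, is not covered by this sketch.

```python
def decode_slave(address, ranges):
    """Return the index of the slave whose range contains the address,
    or None when the address is erroneous (matches no range).

    ranges: list of (slave_index, start_address, end_address), end inclusive.
    """
    for slave_index, start, end in ranges:
        if start <= address <= end:
            return slave_index
    return None


# Hypothetical address map with two non-overlapping ranges.
RANGES = [
    (0, 0x0000_0000, 0x0FFF_FFFF),
    (1, 0x1000_0000, 0x1FFF_FFFF),
]
```

A register pair holding a start address and an offset, rather than a start address and an end address, would only change how `end` is computed.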

The de-multiplexer 650 sends data, address and attribute signals to the arbiter and multiplexer 800 that is connected, via a splitter 500, to the target slave.

The control unit 660 controls the operation of the address and attribute sampler 610, address and priority translation unit 620, slave decoder 630, main sampler 640 and the de-multiplexer 650. The control unit 660 can control power-gating techniques, and block transaction requests aimed to a certain target slave until a current transaction that is aimed to that certain target slave is completed. The transaction completion can be indicated by an end of transaction signal that is sent from the target slave.

Conveniently, the control unit 660 includes an access tracker, request generator, end of data indication generator and a transaction type tracking circuitry. The access tracker tracks transactions that did not end. The request generator sends transaction request signals towards target slaves. The end of data indication generator sends an EOD indication towards the master. The transaction type tracking circuitry stores information that indicates the type (read, write, error, idle) of transactions that are currently in their data phase.

FIG. 5 illustrates a splitter 500, according to an embodiment of the invention.

Splitter 500 is adapted to receive data transactions from the master and convert them to one or more transactions towards the slave, and vice versa. The splitter 500 stores various slave transaction characteristics (also referred to as attributes), such as maximal burst size, data burst alignment, wrap size, and the like. It then defines the translations towards the slave in response to these attributes. The splitter 500 also applies the three-stage protocol towards the slave and towards the master. For example, if a master sends a data burst of 128 bits and the slave can receive data bursts of 32 bits then the splitter 500 converts this data burst to four slave data bursts.
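The burst conversion in the example above can be sketched as follows. The function name is hypothetical, only the maximal burst size attribute is modelled (alignment and wrap size are ignored), and the handling of a burst that is not a multiple of the slave burst size is an assumption.

```python
def split_burst(burst_bits, max_slave_burst_bits):
    """Return the sizes, in bits, of the slave bursts that carry one master burst."""
    if burst_bits <= 0 or max_slave_burst_bits <= 0:
        raise ValueError("burst sizes must be positive")
    full, rest = divmod(burst_bits, max_slave_burst_bits)
    # Assumption: a trailing partial burst carries whatever is left over.
    return [max_slave_burst_bits] * full + ([rest] if rest else [])


slave_bursts = split_burst(128, 32)  # four slave bursts of 32 bits each
```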

The splitter 500 can be configured to be responsive to the slave transaction attributes (optimize mode) or as a sampling stage (sampler mode). In the sampler mode the splitter 500 only samples signals and sends them towards the slave. It is noted that the bus widths of the input port and output port of the splitter 500 are the same, thus the sampler mode can be easily executed.

The splitter 500 includes a data unit 510, a respond unit 520, a request unit 530 and a control/debug unit 540. The control/debug unit 540 controls the splitter and is also used during debug mode.

It is noted that other modular components of interconnect 100 include a debug unit and/or a combined debug and control unit, but for simplicity of explanation only FIG. 5 illustrates a debug unit.

The data unit 510 includes buffers that enable data to be exchanged between the master and the slave. The respond unit 520 manages the end of transmission signal and the end of data signals. The request unit 530 performs the access optimization and manages other control signals.

The splitter 500 can store multiple transaction requests, and includes one sampling circuit as well as an optional sampling circuit that can be bypassed. The second sampling circuit is located within the request unit 530. Conveniently, two sampling circuits are activated when the splitter 500 wrap is enabled, or when the splitter 500 operates in an optimize mode.

Conveniently, when a write transaction occurs, the master sends a data burst to the splitter 500. The master also sends information reflecting the size of the burst, so that the splitter 500 can send an EOD signal towards the master once it has received the whole data burst and the master-splitter data phase ends. It can also send an EOT signal once the master-splitter end of transaction phase ends. The EOD and EOT can be sent even if the data was not sent (or was not completely sent) to the slave. The splitter 500 sends data to the slave in one or more data beats, and uses the three-stage protocol. The slave sends to the splitter 500 EOD and EOT signals once the splitter-slave data phase and the splitter-slave transaction end phase are completed.

According to an embodiment of the invention the splitter 500 can also support transaction priority upgrading and also time based priority upgrading. These features can be required if the splitter 500 is followed by an arbiter.

FIG. 6 illustrates multiplexer and arbiter 800, according to an embodiment of the invention.

Multiplexer and arbiter 800 includes multiple (such as M) input ports 801-803, output port 812, an atomic stall unit 810, multiplexer 820, arbiter 830 and sampler 840. The atomic stall unit 810 receives transaction requests from various masters that are aimed to the same slave. Sampler 840 samples the arbitration result. It is connected between the multiplexer 820 and the arbiter 830.

The arbiter 830 receives the transaction requests from the atomic stall unit 810, master arbitration priority and master weights, a late arbitration control signal, and provides to the multiplexer 820 the arbitration winner and an indication that a transaction starts. The transaction start indication is responsive to a transaction acknowledgement signal sent from the splitter. The multiplexer 820 also receives the transaction requests and in response to the control signal from the arbiter 830 selects one of the pending transaction requests to be outputted to the splitter 500.

The arbiter 830 includes an arbiter engine 832, a request organizer 834 and a request generator 836.

The request organizer 834 receives the transaction requests and their priority level and generates multiple request vectors, each vector represents the transaction requests that belong to a certain priority level. Each vector indicates the masters that sent pending transaction requests.

The request generator 836 includes a masking unit 837 that selectively masks various transaction requests of predefined priorities, during predefined time slots. For example, assume that four priority levels exist and that sixteen time slots are defined. During two time slots the highest priority transaction requests are masked and the corresponding request vector is null. During two other time slots the two highest priority transaction requests are masked and the two corresponding request vectors are null. During one time slot only the lowest priority level transaction requests are enabled and during the other time slots all the transaction requests are unmasked.
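The sixteen-slot example above can be sketched as a small masking table. The text does not say which slot indices carry which mask, nor how priority levels are numbered, so the slot assignment (slots 0 to 4) and the numbering (0 lowest, 3 highest) are assumptions made for illustration.

```python
NUM_SLOTS = 16  # sixteen time slots, as in the example


def masked_priorities(slot):
    """Return the set of priority levels (0 = lowest, 3 = highest) masked in a slot."""
    slot %= NUM_SLOTS
    if slot in (0, 1):      # two slots: highest priority masked
        return {3}
    if slot in (2, 3):      # two other slots: two highest priorities masked
        return {3, 2}
    if slot == 4:           # one slot: only the lowest priority is enabled
        return {3, 2, 1}
    return set()            # remaining eleven slots: nothing is masked


def apply_mask(request_vectors, slot):
    """request_vectors maps priority level -> set of masters with pending requests.
    A masked level yields a null (empty) request vector."""
    masked = masked_priorities(slot)
    return {p: (set() if p in masked else set(m)) for p, m in request_vectors.items()}


vectors = {3: {"cpu"}, 2: {"dsp"}, 1: {"dma"}, 0: {"usb"}}
```

Masking the higher priorities in a few slots gives lower priority masters a periodic opportunity to win an arbitration cycle.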

The request generator 836 also applies the weighted arbitration and the late decision arbitration, by sending to the arbiter engine 832 timing signals that indicate when to perform an arbitration cycle. For example, the request generator can receive an indication about the size of a data burst and the size of the data beat and determine when to trigger the next arbitration cycle. The request generator 836 is aware of the priorities of the pending transaction requests and can request an arbitration cycle if a higher priority request has arrived during a long transaction of a lower priority transaction request.

The request generator 836 also sends control signals such as master request signal and slave acknowledge signal in order to implement the three phase protocol.

The arbiter engine 832 includes multiple arbitration circuits, each associated with transaction requests that belong to the same priority level. The arbitration winner is the highest priority unmasked transaction request that won an arbitration cycle within its arbitration circuit.

The arbiter engine 832 receives multiple request vectors, each vector represents the transaction requests that belong to a certain priority level. Each vector indicates the masters that sent pending transaction requests. The arbiter engine 832 applies a pseudo round robin arbitration scheme that takes into account only the winner of the last arbitration cycle.
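The request vectors and the per-level arbitration described above can be sketched as follows. This is one possible reading, not the circuit of the embodiment: the function names are hypothetical, priority levels are numbered from 0 (lowest), and keeping one "last winner" per priority level is an assumption.

```python
def build_request_vectors(requests, num_masters, num_priorities):
    """requests: iterable of (master_index, priority_level).
    Returns one boolean vector per priority level; entry m is True when
    master m has a pending request at that level."""
    vectors = [[False] * num_masters for _ in range(num_priorities)]
    for master, priority in requests:
        vectors[priority][master] = True
    return vectors


def round_robin_pick(vector, last_winner):
    """Pick the first requesting master after last_winner, wrapping around.
    Only the last winner is remembered, as in a pseudo round robin scheme."""
    n = len(vector)
    for offset in range(1, n + 1):
        candidate = (last_winner + offset) % n
        if vector[candidate]:
            return candidate
    return None


def arbitrate(vectors, last_winners):
    """Return (priority_level, master) for the highest level that has an
    unmasked pending request, or None when nothing is pending."""
    for priority in range(len(vectors) - 1, -1, -1):
        winner = round_robin_pick(vectors[priority], last_winners[priority])
        if winner is not None:
            last_winners[priority] = winner
            return priority, winner
    return None


# Masters 0 and 2 request at level 1, master 1 requests at level 0.
vectors = build_request_vectors([(0, 1), (2, 1), (1, 0)], num_masters=3, num_priorities=2)
last_winners = [0, 0]
first = arbitrate(vectors, last_winners)
second = arbitrate(vectors, last_winners)
```

A masked priority level would simply be passed in as an all-False vector, so the engine falls through to the next lower level.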

Those of skill in the art will appreciate that other arbitration schemes, including well-known arbitration schemes, can be applied.

FIG. 7 illustrates a clock separator 300, according to an embodiment of the invention.

Clock separator 300 supports priority upgrading and also the three-stage protocol. It includes input and output interfaces as well as control path 310, data path 320 and a controller 330. The controller 330 controls the operation of the clock separator while the control path 310 is used to propagate transaction requests, control signals and attributes. These signals can include EOT signal, EOD signal, acknowledgement signals, transaction request signals and the like.

The controller 330 can receive indications about the mode of operation of the clock separator and control the clock separator 300 accordingly. For example, the clock separator can operate in a bypass mode during which the input clock frequency and the output clock frequency are the same, in various modes in which there is a predefined relationship between the input and output clocks and the like.

The data path 320 includes two sampling circuits for write operations and one sampling circuit for read operations. The data path 320 usually includes a buffer for write operations and a buffer for read operations. The buffering compensates for differences between the input and output clock frequencies.

The dashed vertical line 301 illustrates that the clock separator 300 components operate at an input frequency domain and an output frequency domain. It is noted that the frequencies can differ from each other but this is not necessarily so. The clock separator 300 can be used to synchronize between input and output clocks, reduce skew and/or jitter and the like.

FIG. 8 illustrates a bus width adaptor 400, according to an embodiment of the invention.

The bus width adaptor 400 supports priority upgrading and also the three-stage protocol. It includes input and output interfaces as well as control path 410, data path 420 and a controller 430. The controller 430 controls the operation of the bus width adaptor 400 while the control path 410 is used to propagate transaction requests, control signals and attributes. These signals can include EOT signal, EOD signal, acknowledgement signals, transaction request signals and the like.

The controller 430 can receive indications about the width of the different buses, alignment of data and timing parameters and control the bus width adaptor 400 accordingly. The data path 420 includes two sampling circuits for write operations and one sampling circuit for read operations. The data path 420 usually includes a buffer for write operations and a buffer for read operations. The buffering compensates for differences between the input and output bus widths.

FIG. 9 illustrates an arbitration method 1100, according to an embodiment of the invention.

The arbitration method 1100 starts by stage 1110 of receiving at least one transaction request associated with at least one master, whereas all the transaction requests are associated with the same slave. Each transaction request is associated with a transaction request priority and a transaction request weight.

Stage 1110 is followed by stage 1120 of selectively masking the transaction requests. The selective masking can be applied in various time slots and can mask transaction requests of one or more priorities, especially the higher priorities.

Stage 1120 is followed by stage 1130 of determining when to perform one or more arbitration cycles. The determination can be responsive to the length of a current transaction. Conveniently, the arbitration cycle is executed near the end of a current transaction. According to an embodiment of the invention there can be a time gap between the selection of an arbitration winner and the beginning of the data phase. This time gap usually occurs in read transactions, although this is not necessarily so. In write transactions the data to be transferred during the data phase is usually stored within the interconnect when the arbitration takes place. In read transactions the data is usually stored within the slave when the arbitration cycle occurs. Thus, instead of waiting for the end of the data transfer in order to initiate the next arbitration cycle, the arbiter calculates the length of the currently approved data transfer and starts the next arbitration cycle after a delay that corresponds to that length.
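The timing decision of stage 1130 can be sketched as follows. The sketch assumes that one data beat is transferred per clock cycle and introduces a hypothetical `lead_cycles` parameter for how many cycles before the end of the transfer the next arbitration cycle starts; neither assumption is stated in the description.

```python
def beats_in_burst(burst_bits, beat_bits):
    """Number of data beats needed to carry a burst (ceiling division)."""
    return -(-burst_bits // beat_bits)


def next_arbitration_cycle(grant_cycle, burst_bits, beat_bits, lead_cycles=1):
    """Clock cycle at which to start the next arbitration cycle.

    The delay corresponds to the length of the currently approved transfer,
    shortened by lead_cycles so that arbitration happens near its end.
    """
    length = beats_in_burst(burst_bits, beat_bits)
    return grant_cycle + max(length - lead_cycles, 0)


# A 128-bit burst over 32-bit beats granted at cycle 10 takes four beats.
cycle = next_arbitration_cycle(10, 128, 32)
```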

Stage 1130 is followed by stage 1140 of performing, for each priority level, an arbitration sequence between unmasked transaction requests. Conveniently, each arbitration cycle involves applying a pseudo round robin arbitration scheme. Stage 1140 provides an arbitration winner and also includes calculating the amount of data beats that can be transferred by the winner.

Stage 1140 is followed by stage 1150 of providing an indication about the arbitration winner. Stage 1150 can include determining the number of transactions that can be consecutively conducted by the arbitration winner. Said determination is usually responsive to the weight of the transaction request.

Stage 1150 is followed by stage 1160 of determining when to perform the next arbitration cycle and jumping to stage 1110. It is noted that if stage 1110 is preceded by stage 1160 then stage 1130 can be skipped. It is noted that even if a certain master won an arbitration cycle and is in the middle of a sequence of transactions, the sequence can be stopped if a higher priority transaction request wins an arbitration cycle.

Method 1100 also includes stage 1115 of updating the priority level of pending transaction requests. Stage 1115 can be executed during the execution of other stages of method 1100. Conveniently, a priority update of a certain transaction request is blocked once the transaction request wins the arbitration, but this is not necessarily so. Stage 1115 can be time based and/or can be initiated by a master. The priority upgrade can include upgrading the priorities of transaction requests that precede the certain transaction request, especially those transaction requests that are stored at the same queue as the certain transaction request.

According to an embodiment of the invention the arbitration scheme is applied by a multiplexer and arbiter that participates in a three-stage communication protocol. Conveniently, the arbiter and multiplexer is a modular component that can be connected to other modular components such as to form an interconnect.

FIG. 10 illustrates a method 1000 for priority updating, according to an embodiment of the invention.

Stage 1115 of method 1100 can include at least some of the stages of method 1000.

Method 1000 starts by stage 1010 of receiving transaction requests and propagating the transaction requests through a first sequence of pipeline stages.

Stage 1010 is followed by stage 1030 of receiving a request to update, to a requested priority, the priorities of transaction requests stored within the first sequence of pipeline stages that precede an arbiter.

Stage 1030 is followed by stages 1050 and 1060. Stage 1050 includes updating a priority level of a transaction request stored in the first sequence to the requested priority if the transaction request is priority upgradeable and if the requested priority is higher than a current priority of the transaction request.

Conveniently, stage 1050 includes checking a priority level attribute and a priority upgradeability attribute associated with the transaction request.
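The two conditions checked in stage 1050 can be sketched as follows. The `PendingRequest` record and its field names are hypothetical stand-ins for the priority level attribute and the priority upgradeability attribute; a larger number is assumed to mean a higher priority.

```python
from dataclasses import dataclass


@dataclass
class PendingRequest:
    priority: int      # priority level attribute (larger = higher, assumed)
    upgradeable: bool  # priority upgradeability attribute


def apply_priority_update(requests, requested_priority):
    """Update every stored request that is upgradeable and whose current
    priority is lower than the requested one. Returns the number updated."""
    updated = 0
    for req in requests:
        if req.upgradeable and requested_priority > req.priority:
            req.priority = requested_priority
            updated += 1
    return updated


stored = [PendingRequest(1, True), PendingRequest(3, True), PendingRequest(0, False)]
count = apply_priority_update(stored, 2)
```

In this sketch a request is never downgraded, and a request that is not upgradeable keeps its priority regardless of the requested level.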

Conveniently, stage 1050 includes updating priorities of transaction requests stored within multiple modular components (such as modular components 500, 600, 700, 800 of FIGS. 1A, 2A, 2B, 3-8) that are adapted to support a certain point-to-point protocol.

Stage 1060 includes performing a priority level upgrade of a priority upgradeable transaction request in response to a time period the priority upgradeable transaction request is pending.

Conveniently, during repetitions of stages 1030-1080 stage 1060 repetitively increments the priority level of upgradeable transaction requests wherein a time difference between consecutive priority upgrades is inversely proportional to a number of priority increases. For example, if the first time based priority upgrade occurs after a first pending period then the second time based priority upgrade occurs after a second pending period that is shorter than the first pending period.
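One possible reading of the shrinking upgrade interval is sketched below: the wait before the k-th upgrade is the base period divided by k. The function names, the integer division and the use of clock cycles as the time unit are assumptions; the description only states that the interval is inversely proportional to the number of priority increases.

```python
def upgrade_times(base_period, num_upgrades):
    """Pending times (in cycles) at which each time based upgrade fires.
    The k-th interval is base_period // k, so consecutive upgrades get closer."""
    times, elapsed = [], 0
    for k in range(1, num_upgrades + 1):
        elapsed += max(1, base_period // k)
        times.append(elapsed)
    return times


def priority_after(pending_cycles, start_priority, max_priority, base_period):
    """Priority level of an upgradeable request after it has been pending
    for pending_cycles, capped at max_priority."""
    priority = start_priority
    for t in upgrade_times(base_period, max_priority - start_priority):
        if pending_cycles >= t:
            priority += 1
    return priority


times = upgrade_times(60, 3)  # intervals of 60, 30 and 20 cycles
```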

Stages 1050 and 1060 are followed by optional stage 1070 of selectively masking transaction requests of various priorities before an arbitration cycle initiates.

Stage 1070 is followed by stage 1080 of arbitrating between transaction requests in response to priority attributes associated with the transaction requests. Stage 1080 is followed by stage 1010.

FIG. 11 illustrates a method 1200 for pipeline limiting, according to an embodiment of the invention.

Method 1200 starts by stage 1210 of setting limiter thresholds that define a maximal amount of pending transaction requests to be provided from one pipeline stage to another pipeline stage. Conveniently, the pipeline stages include modular components such as but not limited to expanders, arbiters and multiplexers, splitters, samplers, clock separators, bus width adaptors and the like.

A limiter threshold is associated with one limiter, although this is not necessarily so. A limiter threshold can be set in response to the components connected to a limiter, to an expected program or application executed by a device, to previous performances of a device and to previous limiter thresholds, to priority of transaction requests, required throughput of the interconnect, latency parameters and the like.

Stage 1210 is followed by stage 1220 of executing an application that involves selectively transferring transaction requests from one stage of the pipeline to another in response to the limiter thresholds, arbitrating between transferred transaction requests, executing selected transaction requests (selected by the arbitration), while monitoring the performance of a device that comprises pipeline limiters.

The application can be executed by a device such as interconnects 100-104 of previous figures.

Stage 1220 can start by stage 1221 of receiving, by a first limiter, a request to perform a transaction from one pipeline stage to another pipeline stage.

Stage 1221 is followed by stage 1222 of selectively transferring transaction requests from one stage of the pipeline to another in response to the limiter thresholds. Stage 1222 may include determining whether the number of pending transaction requests exceeds a limiter threshold. If it does not, the transaction request is sent to the other pipeline stage; otherwise the provision of the transaction request to the other pipeline stage is delayed.
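The limiter decision of stage 1222 can be sketched as follows. The `Limiter` class, its method names and the release of a delayed request upon completion are assumptions made for illustration. The sketch also adopts one interpretation of the threshold: a request is forwarded only while the number of pending requests is below the threshold, so that the pending count never exceeds it.

```python
from collections import deque


class Limiter:
    """Sits between two pipeline stages and bounds the pending requests."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.pending = 0        # requests forwarded and not yet completed
        self.delayed = deque()  # requests held back, oldest first

    def submit(self, request):
        """Forward the request if the threshold allows it, else delay it.
        Returns True when the request was forwarded."""
        if self.pending < self.threshold:
            self.pending += 1
            return True
        self.delayed.append(request)
        return False

    def complete(self):
        """Called when the next stage finishes one request. Releases the
        oldest delayed request, if any, and returns it (else None)."""
        if self.pending == 0:
            raise ValueError("no pending request to complete")
        self.pending -= 1
        if self.delayed:
            self.pending += 1
            return self.delayed.popleft()
        return None


limiter = Limiter(threshold=2)
results = [limiter.submit(r) for r in ("a", "b", "c")]
released = limiter.complete()
```

A lower threshold keeps fewer requests queued ahead of the arbiter, which shortens the wait of a late urgent request at the cost of fewer requests in flight.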

Stage 1222 is followed by stage 1226 of arbitrating between transaction requests at a certain pipeline stage.

Stage 1226 is followed by stage 1228 of executing selected transaction requests provided by the arbitrating.

Stage 1220 can also include stage 1227 of monitoring the traffic that passes by various junctions/buses and stage 1229 of reporting these findings to a controller that can determine the limiter thresholds.

Conveniently, stage 1220 is followed by stage 1230 of comparing the performance of the device to required performance targets.

Stage 1230 is followed by stage 1240 of determining whether to alter the limiter thresholds and, if so, jumping to stage 1250 of altering the limiter thresholds in response to the comparison. Stage 1250 is followed by stage 1220.
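Stages 1230 to 1250 can be sketched as a simple adjustment rule. The description does not specify how the thresholds are altered, so the policy below is purely an assumption: the threshold is lowered when the measured latency misses its target (a shallower pipeline delays urgent requests less) and raised when the measured throughput misses its target (a deeper pipeline keeps more requests in flight). The function name, the step of one and the bounds are hypothetical.

```python
def adjust_threshold(threshold, measured_latency, target_latency,
                     measured_throughput, target_throughput,
                     lowest=1, highest=16):
    """Return the limiter threshold to use for the next execution period."""
    if measured_latency > target_latency:
        # Latency target missed: allow fewer pending requests.
        return max(lowest, threshold - 1)
    if measured_throughput < target_throughput:
        # Throughput target missed: allow more pending requests.
        return min(highest, threshold + 1)
    # Both targets met: keep the threshold unchanged.
    return threshold


new_threshold = adjust_threshold(4, measured_latency=120, target_latency=100,
                                 measured_throughput=10, target_throughput=8)
```

Repeating this rule after each execution period corresponds to the loop from stage 1250 back to stage 1220.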

Variations, modifications, and other implementations of what is described herein will occur to those of ordinary skill in the art without departing from the spirit and the scope of the invention as claimed. Accordingly, the invention is to be defined not by the preceding illustrative description but instead by the spirit and scope of the following claims.

1. A method for executing transactions in a pipelined device, the method comprises: arbitrating between transaction requests at a certain pipeline stage; setting limiter thresholds that define a maximal amount of pending transaction requests to be provided from one pipeline stage to another pipeline stage; executing an application while monitoring the performance of a device that comprises pipeline limiters; wherein the executing comprises: selectively transferring transaction requests from one stage of the pipeline to another in response to the limiter thresholds, and executing selected transaction requests provided by the arbitrating.
2. The method according to claim 1 wherein the setting comprises setting a limiter threshold per limiter.
3. The method according to claim 1 wherein the setting comprises setting a limiter threshold in response to an identity of components coupled to a limiter associated with the limiter threshold.
4. The method according to claim 1 wherein the setting comprises setting a limiter threshold in response to a program or an application executed by a device.
5. The method according to claim 1 wherein the setting comprises setting a limiter threshold in response a priority of transaction requests.
6. The method according to claim 1 wherein the executing is followed by comparing the performance of the pipelined device to required performances targets and determining whether to alter the limiter thresholds in response to the comparison.
7. A pipelined device adapted to execute transactions, the device comprises: an arbiter adapted to arbitrate between transaction requests; multiple limiters a controller adapted to set limiter thresholds that define a maximal amount of pending transaction requests to be provided from one pipeline stage to another pipeline stage; and multiple monitors adapted to monitoring traffic passes through various buses of the device while device executes an application; wherein a limiter out of the multiple limiters that is coupled between one pipeline stage to another pipeline stage of the pipelined device is adapted to selectively transfer transaction requests from one pipeline stage of the pipeline to another in response to the limiter threshold, and wherein a pipeline stage of the pipelined device is adapted to execute selected transaction requests provided by the arbiter.
8. The pipelined device according to claim 7 wherein the controller is adapted to set a limiter threshold per limiter.
9. The pipelined device according to claim 7 wherein the controller is adapted to set a limiter threshold in response to an identity of components coupled to a limiter associated with the limiter threshold.
10. The pipelined device according to claim 7 wherein the controller is adapted to set a limiter threshold in response to a program or an application executed by a device.
11. The pipelined device according to claim 7 wherein the controller is adapted to set a limiter threshold in response a priority of transaction requests.
12. The pipelined device according to claim 7 wherein the controller is adapted to compare a performance of the pipelined device to required performances targets and determine whether to alter the limiter thresholds in response to the comparison.
13. The pipelined device according to claim 7 wherein the pipelines stages comprise expanders, whereas different expanders are coupled to different masters, and wherein each expander is coupled in parallel to S arbiters and multiplexers.
14. The pipelined device according to claim 7 wherein the pipelines stages comprise S splitters, wherein each splitter is adapted to optimize transactions towards a slave associated with the splitter.
15. The pipelined device according to claim 7 wherein the pipelines stages are adapted to alter arbitration priority indications of pending transaction requests.
16. The method according to claim 2 wherein the setting comprises setting a limiter threshold in response to an identity of components coupled to a limiter associated with the limiter threshold.
17. The method according to claim 2 wherein the setting comprises setting a limiter threshold in response to a program or an application executed by a device.
18. The method according to claim 2 wherein the setting comprises setting a limiter threshold in response a priority of transaction requests.
19. The pipelined device according to claim 8 wherein the controller is adapted to set a limiter threshold in response to an identity of components coupled to a limiter associated with the limiter threshold.
20. The pipelined device according to claim 8 wherein the controller is adapted to set a limiter threshold in response a priority of transaction requests.